Everything about Likelihood Function totally explained
In statistics, the likelihood function (often simply the likelihood) is a function of the parameters of a statistical model that plays a key role in statistical inference. In non-technical usage, "likelihood" is a synonym for "probability", but throughout this article only the technical definition is used. Informally, if "probability" allows us to predict unknown outcomes based on known parameters, then "likelihood" allows us to estimate unknown parameters based on known outcomes.
In a sense, likelihood works backwards from probability: given B, we use the conditional probability P(A|B) to reason about A, and given A, we use the likelihood function L(B|A) to reason about B. This mode of reasoning is formalized in Bayes' theorem:
$$P(B \mid A) = \frac{P(A \mid B)\, P(B)}{P(A)}$$
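For example, suppose a coin that lands heads with probability pH is tossed twice and 'HH' is observed. The likelihood of pH = 0.5 given this observation is 0.5 × 0.5 = 0.25. But this isn't the same as saying that the probability of pH = 0.5, given the observation, is 0.25.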
To take an extreme case, on this basis we can say "the likelihood of pH = 1 given the observation 'HH' is 1". But it's clearly not the case that the probability of pH = 1 given the observation is 1: the event 'HH' can occur for any pH > 0 (and often does, in reality, for pH roughly 0.5). If the probability of pH = 1 given the observation were 1, it would mean that pH must equal 1 for the event 'HH' to occur, which is obviously not true.
The likelihood function isn't a probability density function: for example, the integral of a likelihood function isn't in general 1. In this example, the integral of the likelihood over the interval [0,1] in pH is 1/3, demonstrating again that the likelihood function can't be interpreted as a probability density function for pH. On the other hand, given any particular value of pH, for example pH = 0.5, the integral of the probability density function over the domain of the random variables is 1.
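The contrast between the two integrals can be checked numerically. The sketch below assumes the example is two independent tosses of a coin with heads probability pH, both landing heads, so that the likelihood is pH squared; this matches the values 0.25, 1 and 1/3 quoted above. The names `likelihood_hh`, `integrate_midpoint` and `outcome_probability` are illustrative, not taken from the article.

```python
def likelihood_hh(p_heads):
    """Likelihood of p_heads after observing 'HH' in two independent tosses."""
    return p_heads * p_heads


def integrate_midpoint(f, a, b, steps=100_000):
    """Approximate the integral of f over [a, b] with the midpoint rule."""
    width = (b - a) / steps
    return sum(f(a + (i + 0.5) * width) for i in range(steps)) * width


def outcome_probability(outcome, p_heads):
    """Probability of a toss sequence such as 'HT' for a fixed p_heads."""
    prob = 1.0
    for toss in outcome:
        prob *= p_heads if toss == "H" else 1.0 - p_heads
    return prob


# The likelihood evaluated at two candidate parameter values.
print(likelihood_hh(0.5))  # 0.25
print(likelihood_hh(1.0))  # 1.0

# Integrating over the parameter does not give 1 (it is close to 1/3).
area_over_parameter = integrate_midpoint(likelihood_hh, 0.0, 1.0)
print(area_over_parameter)

# Summing over all outcomes for a fixed parameter does give 1.
total_over_outcomes = sum(
    outcome_probability(o, 0.5) for o in ("HH", "HT", "TH", "TT")
)
print(total_over_outcomes)
```

The first integral runs over the parameter with the data held fixed, while the second sum runs over the data with the parameter held fixed; only the second is constrained to equal 1.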
Likelihoods that eliminate nuisance parameters
In many cases, the likelihood is a function of more than one parameter but interest focuses on the estimation of only one or at most a few of them, with the others being considered as nuisance parameters. Several alternative ways have been developed to eliminate such nuisance parameters so that a likelihood can be written as a function of the parameter (or parameters) of interest only, the main ones being marginal, conditional and profile likelihoods.
These are useful because standard likelihood methods can become unreliable or fail entirely when there are many nuisance parameters (or the nuisance parameter is high-dimensional), particularly when the number of nuisance parameters is a substantial fraction of the number of observations and this fraction doesn't decrease when the sample size increases. They can also be used to derive closed-form formulae for statistical tests when direct use of maximum likelihood requires iterative numerical methods, and find application in some specialized topics such as sequential analysis.
Conditional likelihood
Sometimes it's possible to find a sufficient statistic for the nuisance parameters, and conditioning on this statistic results in a likelihood which doesn't depend on the nuisance parameters.
One example occurs in 2×2 tables, where conditioning on all four marginal totals leads to a conditional likelihood based on the non-central hypergeometric distribution. (This form of conditioning is also the basis for Fisher's exact test.)
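The 2×2 case can be sketched in a few lines. The parametrisation is an assumption made for illustration: two groups of sizes m1 and m2, n successes in total, x successes in the first group, and the odds ratio as the parameter of interest. The function name `conditional_likelihood` and the example counts are hypothetical.

```python
from math import comb


def conditional_likelihood(odds_ratio, x, m1, m2, n):
    """Likelihood of the odds ratio given x, conditional on all margins.

    This is the non-central hypergeometric probability of seeing x
    successes in group 1 when the group sizes (m1, m2) and the total
    number of successes (n) are held fixed.
    """
    lo, hi = max(0, n - m2), min(m1, n)  # feasible values of x

    def weight(u):
        return comb(m1, u) * comb(m2, n - u) * odds_ratio ** u

    return weight(x) / sum(weight(u) for u in range(lo, hi + 1))


# Hypothetical table: 3 of 4 successes in group 1, 1 of 4 in group 2.
x, m1, m2, n = 3, 4, 4, 4

# At odds ratio 1 this reduces to the ordinary hypergeometric probability.
print(conditional_likelihood(1.0, x, m1, m2, n))

# A coarse grid search for the odds ratio that maximises the likelihood.
grid = [0.1 * k for k in range(1, 501)]
best = max(grid, key=lambda psi: conditional_likelihood(psi, x, m1, m2, n))
print(best)
```

The individual success probabilities of the two groups never appear in the function: conditioning on the margins has removed them, leaving the odds ratio as the only parameter.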
Marginal likelihood
Sometimes we can remove the nuisance parameters by considering a likelihood based on only part of the information in the data, for example by using the set of ranks rather than the numerical values. Another example occurs in linear mixed models, where considering a likelihood for the residuals only after fitting the fixed effects leads to residual maximum likelihood estimation of the variance components. (Note that there's a different meaning of marginal likelihood in Bayesian inference.)
Profile likelihood
It is often possible to write some parameters as functions of other parameters, thereby reducing the number of independent parameters.
(The function is the parameter value which maximises the likelihood given the value of the other parameters.)
This procedure is called concentration of the parameters and results in the concentrated likelihood function, also occasionally known as the maximized likelihood function, but most often called the profile likelihood function.
For example, consider a regression analysis model with normally distributed errors. For any given values of the other parameters, the most likely value of the error variance is the variance of the residuals. The residuals depend on all other parameters. Hence the variance parameter can be written as a function of the other parameters.
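The regression example can be sketched as follows, assuming a straight-line model y = intercept + slope · x with independent normal errors. The data values and all function names are made up for illustration. Note that the maximising variance divides the residual sum of squares by n rather than by n minus the number of fitted coefficients.

```python
import math


def residuals(intercept, slope, xs, ys):
    return [y - (intercept + slope * x) for x, y in zip(xs, ys)]


def full_log_likelihood(intercept, slope, variance, xs, ys):
    """Normal log-likelihood in all three parameters."""
    res = residuals(intercept, slope, xs, ys)
    n = len(res)
    rss = sum(r * r for r in res)
    return -0.5 * n * math.log(2 * math.pi * variance) - rss / (2 * variance)


def concentrated_variance(intercept, slope, xs, ys):
    """Variance that maximises the likelihood for fixed coefficients."""
    res = residuals(intercept, slope, xs, ys)
    return sum(r * r for r in res) / len(res)


def concentrated_log_likelihood(intercept, slope, xs, ys):
    """Log-likelihood with the variance replaced by its maximising value."""
    n = len(xs)
    s2 = concentrated_variance(intercept, slope, xs, ys)
    return -0.5 * n * (math.log(2 * math.pi * s2) + 1.0)


# Hypothetical data, roughly following y = 2x.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [0.1, 1.9, 4.2, 5.8, 8.1]

s2_hat = concentrated_variance(0.0, 2.0, xs, ys)
print(concentrated_log_likelihood(0.0, 2.0, xs, ys))
print(full_log_likelihood(0.0, 2.0, s2_hat, xs, ys))  # same value
```

The concentrated function depends only on the intercept and slope, so an optimiser has two parameters to search over instead of three.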
Unlike conditional and marginal likelihoods, profile likelihood methods can always be used (even when the profile likelihood can't be written down explicitly). However, the profile likelihood isn't a true likelihood as it isn't based directly on a probability distribution and this leads to some less satisfactory properties. (Attempts have been made to improve this, resulting in modified profile likelihood.)
The idea of profile likelihood can also be used to compute confidence intervals that often have better small-sample properties than those based on asymptotic standard errors calculated from the full likelihood.
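One common construction, sketched here as an assumption rather than as the method the article has in mind, keeps every parameter value whose profile log-likelihood lies within half a chi-squared cutoff of the maximum. The example uses a normal sample with the mean as the parameter of interest and the variance as the nuisance parameter. The data, the function names and the choice of a 95% level are illustrative, and the data must not all be equal.

```python
import math

# 95% quantile of the chi-squared distribution with 1 degree of freedom.
CHI2_95_1DF = 3.841458820694124


def profile_log_likelihood(mean, data):
    """Normal log-likelihood with the variance maximised out for this mean."""
    n = len(data)
    variance = sum((x - mean) ** 2 for x in data) / n
    return -0.5 * n * (math.log(2 * math.pi * variance) + 1.0)


def bisect_root(f, lo, hi, tol=1e-10):
    """Find a sign change of f between lo and hi by bisection."""
    f_lo = f(lo)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        f_mid = f(mid)
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def profile_interval(data, cutoff=CHI2_95_1DF):
    """Means whose likelihood-ratio statistic stays below the cutoff."""
    mean_hat = sum(data) / len(data)
    peak = profile_log_likelihood(mean_hat, data)

    def excess(mean):
        return 2.0 * (peak - profile_log_likelihood(mean, data)) - cutoff

    spread = max(data) - min(data)  # used only to bracket the roots
    lower = bisect_root(excess, mean_hat - 10 * spread, mean_hat)
    upper = bisect_root(excess, mean_hat, mean_hat + 10 * spread)
    return lower, upper


# Hypothetical measurements.
data = [4.8, 5.1, 5.6, 4.9, 5.3, 5.0]
lower, upper = profile_interval(data)
print(lower, upper)
```

For this particular model the endpoints also have a closed form, but the root-finding version carries over to models where the profile likelihood can only be evaluated numerically.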
Historical remarks
Some early thoughts on likelihood were made in a book by Thorvald N. Thiele published in 1889.
The first paper where the full idea of the "likelihood" appears was written by R.A. Fisher in 1922: "On the mathematical foundations of theoretical statistics". In that paper, Fisher also uses the term "method of maximum likelihood". Fisher argues against inverse probability as a basis for statistical inferences, and instead proposes inferences based on likelihood functions.